Distinctive Frequent Itemset Mining from Time Segmented Databases Using ZDD-Based Symbolic Processing

Authors

  • Shin-ichi Minato
  • Takeaki Uno
Abstract

Frequent itemset mining is one of the fundamental techniques for data mining and knowledge discovery. Recently, Minato et al. proposed a fast algorithm, "LCM over ZDDs", for generating very large-scale frequent itemsets using Zero-suppressed BDDs (ZDDs), a compact graph-based data structure. Their method is based on the LCM algorithm, one of the most efficient state-of-the-art techniques for itemset mining, and directly generates compact output data structures in main memory, which can then be post-processed efficiently using ZDD-based algebraic operations. In this paper, we propose a novel method of finding distinctive frequent itemsets from time-segmented (e.g., daily, weekly, monthly) sequential transaction databases. We define a "frequency pattern chart" using regular expressions for specifying distinctive frequency patterns in time-segmented databases. Our method efficiently extracts all itemsets satisfying a given frequency pattern chart using the LCM over ZDDs algorithm and ZDD-based symbolic processing of finite automata. Experimental results show that our method is applicable to very large-scale problems; for example, we can find a small number of distinctive itemsets from a huge number (more than 10^44) of frequent itemsets in a few seconds. Time-segmented databases appear in many real-life problems, so our new method will have a significant impact on various practical applications.
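The idea of matching per-segment frequencies against a regular expression can be illustrated with a small brute-force sketch. It is not the LCM over ZDDs algorithm and uses no ZDDs; it only shows what "an itemset satisfying a frequency pattern" could mean on toy data. The encoding is an assumption made for illustration: each time segment is mapped to `H` if the itemset's support reaches a threshold and to `L` otherwise, and the pattern is an ordinary regular expression over those two symbols. The function names, the `H`/`L` alphabet, and the toy data are all hypothetical and do not come from the paper.

```python
import re
from itertools import combinations

def segment_supports(segments, itemset):
    """Count, per time segment, the transactions that contain every item of itemset."""
    return [sum(1 for t in seg if itemset <= t) for seg in segments]

def encode(supports, min_support):
    """Map each segment to 'H' (support >= min_support) or 'L' (otherwise).

    This two-symbol alphabet is an assumption for illustration only.
    """
    return "".join("H" if s >= min_support else "L" for s in supports)

def distinctive_itemsets(segments, pattern, min_support, max_size=2):
    """Brute-force search for itemsets whose H/L string fully matches pattern."""
    items = sorted(set().union(*(t for seg in segments for t in seg)))
    regex = re.compile(pattern)
    found = {}
    for k in range(1, max_size + 1):
        for combo in combinations(items, k):
            code = encode(segment_supports(segments, frozenset(combo)), min_support)
            if regex.fullmatch(code):
                found[combo] = code
    return found

# Toy data: three time segments (e.g. weeks), each a list of transactions.
segments = [
    [{"a", "b"}, {"a", "c"}, {"b", "c"}],
    [{"a", "b"}, {"a", "b"}, {"c"}],
    [{"a", "b", "c"}, {"a", "b"}, {"a", "b"}],
]

# "L+H+": infrequent in the early segments, then frequent through the last one.
rising = distinctive_itemsets(segments, r"L+H+", min_support=2)
print(rising)  # {('a', 'b'): 'LHH'}
```

Exhaustive enumeration like this is only feasible for a handful of items. According to the abstract, the compact ZDD representation and symbolic processing of finite automata are what make the approach workable at much larger scales.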

Similar resources

MaRFI: Maximal Regular Frequent Itemset Mining using a pair of Transaction-ids

Frequent pattern mining is a fundamental and dominant research area in data mining. Maximal frequent patterns are one of the compact representations of frequent itemsets. There are many algorithms for finding maximal frequent patterns that are suitable for mining transactional databases. Users are not only interested in occurrence frequency but may also be interested in frequent patterns tha...

A Fuzzy Algorithm for Mining High Utility Rare Itemsets – FHURI

Classical frequent itemset mining identifies frequent itemsets in transaction databases using only the frequency of item occurrences, without considering the utility of items. In many real-world situations, the utility of itemsets is based on the user's perspective, such as cost, profit or revenue, and is of significant importance. Utility mining considers using utility factors in data mining tasks. Utility-...

Generating Frequent Patterns Through Intersection Between Transactions

The problem of frequent itemset mining is considered in this paper. A new technique is proposed to generate frequent patterns in large databases without time-consuming candidate generation. This technique focuses on transactions instead of concentrating on itemsets. The algorithm is based on taking the intersection between one transaction and the other transactions, and the maximum shared items be...

A Survey on Mining Algorithms

Data mining is a process that discovers knowledge or hidden patterns in large databases. In large databases, association rules are used to find meaningful relationships among large numbers of itemsets, and these rules are built from frequent itemsets. Association rule mining is the most important application in large databases. Most association rule mining algorithms are improved...

A Study of Differentially Private Frequent Itemset Mining

Frequent sets play an important role in many data mining tasks that search for interesting patterns in databases, such as association rules, sequences, correlations, episodes, classifiers and clusters. Frequent Itemset Mining (FIM) is one of the most well-known techniques for extracting knowledge from datasets. In this paper differential privacy aims to get means to increase the accuracy of queries fr...

Publication date: 2009